Distributed representation-based spoken word sense induction
Abstract
Spoken Term Detection (STD) or Keyword Search (KWS) techniques can locate keyword instances but do not differentiate between their meanings. Spoken Word Sense Induction (SWSI) differentiates target instances by clustering them according to context, providing a more useful result. In this paper, we present a fully unsupervised SWSI approach based on distributed representations of spoken utterances. We compare this approach to several others, including the state-of-the-art Hierarchical Dirichlet Process (HDP). To determine how automatic speech recognition (ASR) performance affects SWSI, we use three levels of Word Error Rate (WER): 40%, 20%, and 0%, where 40% WER is representative of online video and 0% of text. We show that the distributed representation approach outperforms all other approaches regardless of WER. Although approaches based on Latent Dirichlet Allocation (LDA) do well on clean data, they degrade significantly as WER increases. Paradoxically, lower WER does not guarantee better SWSI performance, due to the influence of common locutions.
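The WER levels quoted in the abstract follow the usual definition: the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the number of reference words. The sketch below illustrates that definition only. The function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the paper, which does not describe its scoring tool here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative (made-up) transcripts for the three WER levels in the abstract.
reference = "the bank raised interest rates"
clean = word_error_rate(reference, "the bank raised interest rates")  # 0 errors / 5 words
light = word_error_rate(reference, "the tank raised interest rates")  # 1 substitution / 5 words
heavy = word_error_rate(reference, "the tank raised interest")        # 1 substitution + 1 deletion / 5 words
```

Here `clean`, `light`, and `heavy` correspond to 0%, 20%, and 40% WER respectively. Note that a single misrecognized context word ("tank" for "bank") is exactly the kind of error that can change the evidence available for sense clustering.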
Related resources
Noun Sense Induction and Disambiguation using Graph-Based Distributional Semantics
We introduce an approach to word sense induction and disambiguation. The method is unsupervised and knowledge-free: sense representations are learned from distributional evidence and subsequently used to disambiguate word instances in context. These sense representations are obtained by clustering dependency-based second-order similarity networks. We then add features for disambiguation from het...
Graph Based Algorithms for Word Sense Induction and Disambiguation
This paper presents a survey of graph-based methods for word sense induction and disambiguation. Many areas of Natural Language Processing, such as Word Sense Disambiguation (WSD), text summarization, and keyword extraction, make use of graph-based methods. The idea behind the graph-based approach is to formulate the problem in a graph setting and apply clustering to obtain a set of clusters (senses). T...
Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering
In recent years, there has been increasing interest in learning distributed representations of word senses. Traditional models based on context clustering usually require careful tuning of model parameters and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning...
Sense-aware Semantic Analysis: A Multi-prototype Word Representation Model using Wikipedia
Human languages are naturally ambiguous, which makes it difficult to automatically understand the semantics of text. Most vector space models (VSM) treat all occurrences of a word as the same and build a single vector to represent the meaning of a word, which fails to capture any ambiguity. We present sense-aware semantic analysis (SaSA), a multi-prototype VSM for word representation based on W...